The following plots and results are generated from a basic random forest model, using all predictors on the training data.
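A minimal sketch of such a fit, using the randomForest package on a toy stand-in for the training frame (the real data has 85 predictors; the column names below are invented, but the call mirrors the one in the output):

```r
library(randomForest)

set.seed(1)
# Toy stand-in for `training`; the real frame has CARAVAN plus 85 predictors.
training <- data.frame(
  CARAVAN = factor(sample(c("nope", "yes"), 200, replace = TRUE,
                          prob = c(0.94, 0.06))),
  x1 = rnorm(200), x2 = rnorm(200)
)

rf_model <- randomForest(CARAVAN ~ ., data = training)
rf_model$err.rate[rf_model$ntree, "OOB"]   # final OOB error estimate
names(rf_model)                            # components of the fitted object
```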
##
## Call:
## randomForest(formula = CARAVAN ~ ., data = training)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 9
##
## OOB estimate of error rate: 6.73%
## Confusion matrix:
## nope yes class.error
## nope 5421 52 0.009501188
## yes 340 8 0.977011494
## [1] "call" "type" "predicted"
## [4] "err.rate" "confusion" "votes"
## [7] "oob.times" "classes" "importance"
## [10] "importanceSD" "localImportance" "proximity"
## [13] "ntree" "mtry" "forest"
## [16] "y" "test" "inbag"
## [19] "terms"
## [1] "mean(rf_model$oob.times/rf_model$ntree)"
## [1] 0.3679615
## [1] 0.3678794
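The two values printed above compare the observed average out-of-bag fraction, `mean(rf_model$oob.times/rf_model$ntree)`, with its theoretical limit: a bootstrap sample of size n omits any given observation with probability (1 - 1/n)^n, which tends to e^(-1) as n grows.

```r
n <- 5821                 # number of training observations
(1 - 1/n)^n               # chance a given row is out-of-bag for one tree
exp(-1)                   # limiting value, 0.3678794...
```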
## MeanDecreaseGini
## MOSTYPE 19.290540490
## MAANTHUI 2.388730467
## MGEMOMV 5.886996004
## MGEMLEEF 6.896790658
## MOSHOOFD 11.507867142
## MGODRK 6.235641798
## MGODPR 10.873606174
## MGODOV 8.801322521
## MGODGE 11.755542695
## MRELGE 8.924406839
## MRELSA 5.637167130
## MRELOV 8.150525562
## MFALLEEN 8.461649101
## MFGEKIND 10.837577942
## MFWEKIND 11.523120778
## MOPLHOOG 10.404821207
## MOPLMIDD 11.696380075
## MOPLLAAG 11.328675850
## MBERHOOG 9.737253478
## MBERZELF 5.742246103
## MBERBOER 4.031571715
## MBERMIDD 11.600341220
## MBERARBG 10.074542579
## MBERARBO 10.458507053
## MSKA 8.921620325
## MSKB1 9.587316068
## MSKB2 9.800834909
## MSKC 10.728741842
## MSKD 6.429987007
## MHHUUR 10.063712233
## MHKOOP 9.913822851
## MAUT1 8.233167623
## MAUT2 7.736926570
## MAUT0 7.769931354
## MZFONDS 8.531803901
## MZPART 9.156031783
## MINKM30 8.976436772
## MINK3045 10.730039697
## MINK4575 10.034603200
## MINK7512 8.623682580
## MINK123M 2.982119768
## MINKGEM 8.805354253
## MKOOPKLA 12.844685412
## PWAPART 11.256612352
## PWABEDR 1.310853763
## PWALAND 0.326422759
## PPERSAUT 18.034250797
## PBESAUT 0.757110775
## PMOTSCO 3.397750901
## PVRAAUT 0.002807535
## PAANHANG 1.427554422
## PTRACTOR 1.573676725
## PWERKT 0.033542756
## PBROM 3.228235358
## PLEVEN 4.859588962
## PPERSONG 0.125616097
## PGEZONG 1.819836076
## PWAOREG 1.589095053
## PBRAND 19.378158602
## PZEILPL 0.244184310
## PPLEZIER 5.689015941
## PFIETS 3.062686807
## PINBOED 1.197211207
## PBYSTAND 3.952070788
## AWAPART 7.897881479
## AWABEDR 0.896055314
## AWALAND 0.294738376
## APERSAUT 16.785504217
## ABESAUT 0.732845203
## AMOTSCO 3.590657479
## AVRAAUT 0.015673016
## AAANHANG 1.414617318
## ATRACTOR 0.950283089
## AWERKT 0.025350459
## ABROM 2.271710248
## ALEVEN 6.192875806
## APERSONG 0.168557718
## AGEZONG 1.468139436
## AWAOREG 1.677082308
## ABRAND 7.926989124
## AZEILPL 0.279083880
## APLEZIER 5.588107872
## AFIETS 4.109556053
## AINBOED 1.153145837
## ABYSTAND 3.465965014
Results from the predictions on the testing data are below.
## rf_predict
## nope yes
## nope 3729 32
## yes 237 1
## [1] "test-error= 0.0680170042510628"
In comparison with the other models, this particular version of the random forest actually classifies some observations as “yes.” A general problem with many of the other models is that, although they have better testing error rates, they never classify testing data as a “yes” outcome. In other words, the false positive rate is nonexistent, but the false negative rate is relatively high.
This model uses the caret package with repeated cross-validation to tune the random forest model.
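The summary below reports RMSE, so CARAVAN was treated as numeric in this fit. A hedged sketch of the tuning setup, using caret's `trainControl` with 5-fold CV repeated 4 times on a toy regression frame (all data and object names here are stand-ins; `ntree` is lowered to keep the toy fast):

```r
library(caret)

set.seed(1)
# Toy numeric-response frame standing in for the training data.
toy <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 4)
RFcaret <- train(y ~ ., data = toy, method = "rf",
                 trControl = ctrl,
                 tuneGrid = data.frame(mtry = 1:2),
                 ntree = 50)
RFcaret$bestTune$mtry   # mtry selected by smallest RMSE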
## Random Forest
##
## 5822 samples
## 85 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 4 times)
## Summary of sample sizes: 4657, 4657, 4658, 4658, 4658, 4658, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 1 0.2328000 0.04598831 0.1098930
## 2 0.2330799 0.03550816 0.1083869
## 3 0.2353624 0.03112684 0.1084650
## 4 0.2366904 0.03021162 0.1085802
## 5 0.2372925 0.03087102 0.1086585
## 6 0.2377683 0.03117739 0.1088583
## 7 0.2380816 0.03187034 0.1089887
## 8 0.2381443 0.03295400 0.1091076
## 9 0.2382825 0.03371327 0.1092386
## 10 0.2385889 0.03364026 0.1094089
## 11 0.2388074 0.03387505 0.1094490
## 12 0.2391214 0.03391453 0.1096743
## 13 0.2391256 0.03452624 0.1096910
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 1.
## [1] "method" "modelInfo" "modelType" "results"
## [5] "pred" "bestTune" "call" "dots"
## [9] "metric" "control" "finalModel" "preProcess"
## [13] "trainingData" "resample" "resampledCM" "perfNames"
## [17] "maximize" "yLimits" "times" "levels"
## mtry RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 1 0.2328000 0.04598831 0.1098930 0.011101017 0.01508267 0.004102595
## 2 2 0.2330799 0.03550816 0.1083869 0.010401804 0.01254881 0.004249404
## 3 3 0.2353624 0.03112684 0.1084650 0.009810852 0.01128211 0.004352661
## 4 4 0.2366904 0.03021162 0.1085802 0.009633047 0.01098039 0.004456811
## 5 5 0.2372925 0.03087102 0.1086585 0.009615064 0.01146871 0.004565664
## 6 6 0.2377683 0.03117739 0.1088583 0.009453645 0.01170839 0.004600833
## 7 7 0.2380816 0.03187034 0.1089887 0.009494438 0.01246559 0.004768156
## 8 8 0.2381443 0.03295400 0.1091076 0.009577769 0.01242227 0.004828244
## 9 9 0.2382825 0.03371327 0.1092386 0.009748927 0.01255698 0.004784547
## 10 10 0.2385889 0.03364026 0.1094089 0.009614049 0.01283911 0.005035203
## 11 11 0.2388074 0.03387505 0.1094490 0.009595494 0.01292354 0.004970173
## 12 12 0.2391214 0.03391453 0.1096743 0.009720219 0.01347034 0.005040520
## 13 13 0.2391256 0.03452624 0.1096910 0.009787493 0.01345622 0.005023032
## [1] 1
##
## Call:
## randomForest(x = x, y = y, ntree = 500, mtry = param$mtry)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 1
##
## Mean of squared residuals: 0.05429577
## % Var explained: 3.39
## IncNodePurity
## MOSTYPE 0.743560330
## MAANTHUI 0.166974215
## MGEMOMV 0.391050207
## MGEMLEEF 0.352890420
## MOSHOOFD 0.710967602
## MGODRK 0.297801894
## MGODPR 0.535078567
## MGODOV 0.430770941
## MGODGE 0.577898760
## MRELGE 0.449593568
## MRELSA 0.340270561
## MRELOV 0.430620689
## MFALLEEN 0.435395010
## MFGEKIND 0.497923465
## MFWEKIND 0.551837314
## MOPLHOOG 0.660334862
## MOPLMIDD 0.634607120
## MOPLLAAG 0.815324589
## MBERHOOG 0.572577109
## MBERZELF 0.344828367
## MBERBOER 0.301749635
## MBERMIDD 0.632354343
## MBERARBG 0.644368507
## MBERARBO 0.491811274
## MSKA 0.518556366
## MSKB1 0.490673633
## MSKB2 0.482227654
## MSKC 0.569997722
## MSKD 0.379726342
## MHHUUR 0.549356984
## MHKOOP 0.660330637
## MAUT1 0.488537783
## MAUT2 0.381777996
## MAUT0 0.487725256
## MZFONDS 0.545168105
## MZPART 0.504422297
## MINKM30 0.558929253
## MINK3045 0.542194880
## MINK4575 0.594042967
## MINK7512 0.507489702
## MINK123M 0.192683494
## MINKGEM 0.595335181
## MKOOPKLA 0.711756595
## PWAPART 0.610017974
## PWABEDR 0.125175523
## PWALAND 0.053535567
## PPERSAUT 1.275149806
## PBESAUT 0.068513317
## PMOTSCO 0.198154608
## PVRAAUT 0.003553985
## PAANHANG 0.118903719
## PTRACTOR 0.144823831
## PWERKT 0.009248035
## PBROM 0.194247172
## PLEVEN 0.229260885
## PPERSONG 0.025697951
## PGEZONG 0.234168957
## PWAOREG 0.165447428
## PBRAND 0.897479938
## PZEILPL 0.051455561
## PPLEZIER 0.686684753
## PFIETS 0.195553855
## PINBOED 0.131866004
## PBYSTAND 0.306637901
## AWAPART 0.516916255
## AWABEDR 0.101885921
## AWALAND 0.054104476
## APERSAUT 1.129552998
## ABESAUT 0.043628080
## AMOTSCO 0.159442944
## AVRAAUT 0.004313869
## AAANHANG 0.136451027
## ATRACTOR 0.089428773
## AWERKT 0.007244795
## ABROM 0.143473235
## ALEVEN 0.360119326
## APERSONG 0.023338078
## AGEZONG 0.127028965
## AWAOREG 0.189216313
## ABRAND 0.425363522
## AZEILPL 0.056982155
## APLEZIER 0.599573585
## AFIETS 0.272364777
## AINBOED 0.118262845
## ABYSTAND 0.301024337
Results from the predictions on the testing data are below. This model exhibits the false-negative problem described above, although the testing error is improved.
## RFcaret_predict
## 0
## nope 3761
## yes 238
## Random Forest
##
## 5821 samples
## 85 predictor
## 2 classes: 'nope', 'yes'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 4 times)
## Summary of sample sizes: 4657, 4656, 4657, 4657, 4657, 4658, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.9402166 0.00000000
## 43 0.9247559 0.05257694
## 85 0.9214491 0.04739453
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
## [1] "method" "modelInfo" "modelType" "results"
## [5] "pred" "bestTune" "call" "dots"
## [9] "metric" "control" "finalModel" "preProcess"
## [13] "trainingData" "resample" "resampledCM" "perfNames"
## [17] "maximize" "yLimits" "times" "levels"
## [21] "terms" "coefnames" "xlevels"
## mtry Accuracy Kappa AccuracySD KappaSD
## 1 2 0.9402166 0.00000000 0.0004131887 0.00000000
## 2 43 0.9247559 0.05257694 0.0036133103 0.02620041
## 3 85 0.9214491 0.04739453 0.0041394355 0.02852629
## [1] 2
##
## Call:
## randomForest(x = x, y = y, ntree = 200, mtry = param$mtry)
## Type of random forest: classification
## Number of trees: 200
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 5.98%
## Confusion matrix:
## nope yes class.error
## nope 5473 0 0
## yes 348 0 1
## MeanDecreaseGini
## MOSTYPE 7.460994278
## MAANTHUI 1.367430411
## MGEMOMV 3.441084340
## MGEMLEEF 3.865874161
## MOSHOOFD 5.732377383
## MGODRK 3.111255695
## MGODPR 5.237797377
## MGODOV 3.927947040
## MGODGE 5.158741750
## MRELGE 4.468034659
## MRELSA 2.915211775
## MRELOV 3.910238797
## MFALLEEN 4.312445020
## MFGEKIND 4.844589245
## MFWEKIND 5.525781607
## MOPLHOOG 4.968570133
## MOPLMIDD 5.236016673
## MOPLLAAG 5.534982268
## MBERHOOG 4.777145078
## MBERZELF 2.673033018
## MBERBOER 2.409578543
## MBERMIDD 5.892645710
## MBERARBG 5.099924451
## MBERARBO 4.779756788
## MSKA 4.757661369
## MSKB1 4.885326247
## MSKB2 4.679449883
## MSKC 5.324309212
## MSKD 3.642837757
## MHHUUR 5.416461357
## MHKOOP 5.033022913
## MAUT1 4.729654171
## MAUT2 3.590993827
## MAUT0 3.918990055
## MZFONDS 4.887238960
## MZPART 4.683533914
## MINKM30 4.566165311
## MINK3045 5.241399678
## MINK4575 5.030265239
## MINK7512 4.427956224
## MINK123M 1.734264273
## MINKGEM 4.701371683
## MKOOPKLA 6.063312529
## PWAPART 4.331455165
## PWABEDR 0.535696163
## PWALAND 0.320811176
## PPERSAUT 8.440227748
## PBESAUT 0.294589525
## PMOTSCO 1.449449924
## PVRAAUT 0.006277594
## PAANHANG 0.960757972
## PTRACTOR 1.010132035
## PWERKT 0.017808339
## PBROM 1.052973276
## PLEVEN 2.117389143
## PPERSONG 0.082290395
## PGEZONG 0.996478594
## PWAOREG 0.614634330
## PBRAND 7.582739567
## PZEILPL 0.195487031
## PPLEZIER 3.324950213
## PFIETS 1.199759095
## PINBOED 0.629966191
## PBYSTAND 2.126018970
## AWAPART 3.367936734
## AWABEDR 0.442986635
## AWALAND 0.257595109
## APERSAUT 7.584040013
## ABESAUT 0.339407427
## AMOTSCO 1.209254816
## AVRAAUT 0.007369495
## AAANHANG 0.770375556
## ATRACTOR 0.577837250
## AWERKT 0.040560219
## ABROM 0.807628570
## ALEVEN 2.573087573
## APERSONG 0.152421861
## AGEZONG 0.532863602
## AWAOREG 0.906036760
## ABRAND 2.963367019
## AZEILPL 0.221192549
## APLEZIER 3.342759305
## AFIETS 1.791309078
## AINBOED 0.625594376
## ABYSTAND 1.877529787
Results from the predictions on the testing data are below.
## RFcaret_predict2
## nope yes
## nope 3761 0
## yes 238 0
## [1] "test-error= 0.0595148787196799"
This model uses gbm boosting.
Results from the predictions on the testing data are below. The testing error is the same as the RFcaret random forest model.
## boosting_predict
## 0
## nope 3761
## yes 238
## Accuracy Kappa
## NA NA
## [1] "test-error= 0.0595148787196799"
## Area under the curve: 0.5
This model uses xgboost to perform boosting.
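A hedged sketch of an xgboost fit with `nrounds = 5`, matching the five train-error lines below (the matrix and label are toy stand-ins for the 85-predictor design matrix and the 0/1 CARAVAN label):

```r
library(xgboost)

set.seed(1)
X <- matrix(rnorm(200 * 4), 200, 4)       # stand-in for the 85 predictors
y <- rbinom(200, 1, 0.06)                 # 0/1 stand-in for CARAVAN

dtrain <- xgb.DMatrix(data = X, label = y)
bst <- xgb.train(params = list(objective = "binary:logistic"),
                 data = dtrain, nrounds = 5)

pred <- as.numeric(predict(bst, dtrain) > 0.5)   # threshold at 0.5
mean(pred != y)                                  # training error
```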
## [1] train-error:0.057207
## [2] train-error:0.057550
## [3] train-error:0.057378
## [4] train-error:0.056863
## [5] train-error:0.057207
Results from the predictions on the testing data are below. The testing error of this model is higher than the caret models’, but it does make predictions in the “yes” category, and the testing error is improved over the original random forest model (model 1).
## [1] "Mean relative difference: 12.28571"
## xgboost_predict
## 0 1
## 0 3740 21
## 1 237 1
## [1] "test-error= 0.0645161290322581"
This model uses a linear support vector machine (SVM) for classification.
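The call shown in the summaries below can be sketched on a toy frame (the data and object names are stand-ins for the real training set):

```r
library(e1071)

set.seed(1)
training <- data.frame(
  CARAVAN = factor(sample(c("nope", "yes"), 150, replace = TRUE,
                          prob = c(0.9, 0.1))),
  x1 = rnorm(150), x2 = rnorm(150)
)

svm_fit <- svm(CARAVAN ~ ., data = training, kernel = "linear")
pred.train <- predict(svm_fit, training)
mean(pred.train != training$CARAVAN)   # training error
```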
##
## Call:
## svm(formula = CARAVAN ~ ., data = training, kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1
## gamma: 0.01176471
##
## Number of Support Vectors: 2199
##
## Call:
## svm(formula = CARAVAN ~ ., data = training, kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1
## gamma: 0.01176471
##
## Number of Support Vectors: 2199
##
## ( 1851 348 )
##
##
## Number of Classes: 2
##
## Levels:
## nope yes
## pred.train
## nope yes
## nope 5473 0
## yes 348 0
## [1] "train-error= 0.0597835423466758"
Results from predictions on the testing data are below. The testing error is nearly identical to the training error and to that of the other models.
## pred.test
## nope yes
## nope 3761 0
## yes 238 0
## [1] "test-error= 0.0595148787196799"
The model was adjusted by tuning over a grid of cost values; the tuning and model summaries are displayed below.
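A sketch of the tuning step, using `e1071::tune` with the same cost grid (2^-5 through 2^10) on a toy frame (data and object names are stand-ins):

```r
library(e1071)

set.seed(1)
training <- data.frame(
  CARAVAN = factor(sample(c("nope", "yes"), 120, replace = TRUE,
                          prob = c(0.9, 0.1))),
  x1 = rnorm(120), x2 = rnorm(120)
)

# 10-fold cross-validation (the default) over 16 cost values.
tuned <- tune(svm, CARAVAN ~ ., data = training,
              ranges = list(cost = 2^(-5:10)), kernel = "linear")
tuned$best.parameters$cost   # cost with the smallest CV error
```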
## cost error dispersion
## 1 3.125e-02 0.05995267 0.009845951
## 2 6.250e-02 0.06012449 0.009797777
## 3 1.250e-01 0.06012449 0.009797777
## 4 2.500e-01 0.06029631 0.009745999
## 5 5.000e-01 0.06012449 0.009797777
## 6 1.000e+00 0.06012449 0.009797777
## 7 2.000e+00 0.06012449 0.009797777
## 8 4.000e+00 0.06029631 0.009745999
## 9 8.000e+00 0.06029631 0.009745999
## 10 1.600e+01 0.06012449 0.009797777
## 11 3.200e+01 0.06012449 0.009797777
## 12 6.400e+01 0.06012449 0.009797777
## 13 1.280e+02 0.05995267 0.009845951
## 14 2.560e+02 0.05995267 0.009845951
## 15 5.120e+02 0.05995267 0.009845951
## 16 1.024e+03 0.05995267 0.009845951
##
## Call:
## best.tune(method = svm, train.x = CARAVAN ~ ., data = training,
## ranges = list(cost = 2^(-5:10)), kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.03125
## gamma: 0.01176471
##
## Number of Support Vectors: 1535
##
## nope yes
## nope 5473 0
## yes 348 0
## tune_prediction
## nope yes
## nope 3761 0
## yes 238 0
## [1] "test-error= 0.0595148787196799"
The testing error does not improve with tuning.
The following model uses a polynomial SVM.
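With degree = 1, gamma = 1, and coef0 = 0, the polynomial kernel (gamma * u'v + coef0)^degree reduces to the plain inner product, so this fit is equivalent to the linear SVM above; that is consistent with both summaries reporting 2199 support vectors. A toy check (names are stand-ins):

```r
library(e1071)

set.seed(1)
training <- data.frame(
  CARAVAN = factor(sample(c("nope", "yes"), 150, replace = TRUE,
                          prob = c(0.9, 0.1))),
  x1 = rnorm(150), x2 = rnorm(150)
)

poly1  <- svm(CARAVAN ~ ., data = training, kernel = "polynomial",
              degree = 1, gamma = 1, coef0 = 0)
linear <- svm(CARAVAN ~ ., data = training, kernel = "linear")

# The two kernels coincide, so predictions should agree.
mean(predict(poly1, training) == predict(linear, training))
```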
##
## Call:
## svm(formula = CARAVAN ~ ., data = training, kernel = "polynomial",
## degree = 1, gamma = 1, coef0 = 0)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 1
## gamma: 1
## coef.0: 0
##
## Number of Support Vectors: 2199
##
## Call:
## svm(formula = CARAVAN ~ ., data = training, kernel = "polynomial",
## degree = 1, gamma = 1, coef0 = 0)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 1
## gamma: 1
## coef.0: 0
##
## Number of Support Vectors: 2199
##
## ( 1851 348 )
##
##
## Number of Classes: 2
##
## Levels:
## nope yes
## pred.train
## nope yes
## nope 5473 0
## yes 348 0
## [1] "train-error= 0.0597835423466758"
Results from predictions on the testing data are below. The testing error does not improve on the previous models.
## pred.test
## nope yes
## nope 3761 0
## yes 238 0
## [1] "test-error= 0.0595148787196799"
The following results are from tuning on the polynomial SVM.
## cost error dispersion
## 1 0.03125 0.05995915 0.01305886
## 2 0.06250 0.05995915 0.01305886
## 3 0.12500 0.06013097 0.01307276
## 4 0.25000 0.06013097 0.01307276
## 5 0.50000 0.06013097 0.01307276
## 6 1.00000 0.06013097 0.01307276
## 7 2.00000 0.06047462 0.01331658
## 8 4.00000 0.06047462 0.01331658
## 9 8.00000 0.06047462 0.01331658
## 10 16.00000 0.06013097 0.01307276
## 11 32.00000 0.06030279 0.01318404
##
## Call:
## best.tune(method = svm, train.x = CARAVAN ~ ., data = training,
## ranges = list(cost = 2^(-5:5)), kernel = "polynomial", degree = 1:3,
## gamma = 1, coef0 = 1)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 0.03125
## degree: 1 2 3
## gamma: 1
## coef.0: 1
##
## Number of Support Vectors: 1503
##
## nope yes
## nope 5473 0
## yes 348 0
The following results are from predictions on the testing data. There is no improvement in the testing error.
## pred.test
## nope yes
## nope 3761 0
## yes 238 0
## [1] "test-error= 0.0595148787196799"
The following model uses a radial SVM, with tuning.
## predsvm5.train
## nope yes
## nope 5473 0
## yes 331 17
## [1] "train-error= 0.056863081944683"
##
## Call:
## best.tune(method = svm, train.x = CARAVAN ~ ., data = training,
## ranges = list(cost = 2^(-5:5), gamma = 2^(-5:0)), kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.03125
## gamma: 0.03125
##
## Number of Support Vectors: 2010
##
## nope yes
## nope 5473 0
## yes 348 0
## [1] "tune-error= 0.0597835423466758"
Neither the tuned nor the untuned model improves the testing error.
## predsvm5.test
## nope yes
## nope 3761 0
## yes 238 0
## predsvm5.tunetest
## nope yes
## nope 3761 0
## yes 238 0
## [1] "test-error= 0.0595148787196799"
## [1] "tuned test-error= 0.0595148787196799"
The following model uses K-means clustering. Scaling leads to significantly different results.
Representative plots are displayed below, using the first two variables in the training data for both scaled and unscaled data.
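A toy illustration of why scaling matters: k-means minimizes within-cluster Euclidean distance, so a column with much larger variance dominates the unscaled solution (the data below are invented):

```r
set.seed(1)
# Two columns on very different scales; without scaling, column `a`
# dominates the Euclidean distances that k-means minimizes.
X <- cbind(a = rnorm(300, sd = 100), b = rnorm(300, sd = 1))

km_raw    <- kmeans(X, centers = 3, nstart = 20)
km_scaled <- kmeans(scale(X), centers = 3, nstart = 20)

km_raw$size      # cluster sizes on the raw data
km_scaled$size   # typically a different partition after scaling
```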
## List of 9
## $ cluster : int [1:5821] 2 3 1 1 3 3 2 3 2 1 ...
## $ centers : num [1:3, 1:36] 1.046 0.468 0.645 4.665 5.057 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:3] "1" "2" "3"
## .. ..$ : chr [1:36] "MGODRK" "MGODPR" "MGODOV" "MGODGE" ...
## $ totss : num 636318
## $ withinss : num [1:3] 139945 147064 171156
## $ tot.withinss: num 458164
## $ betweenss : num 178154
## $ size : int [1:3] 1668 2072 2081
## $ iter : int 4
## $ ifault : int 0
## - attr(*, "class")= chr "kmeans"
## List of 9
## $ cluster : int [1:5821] 1 2 1 1 2 3 3 3 3 1 ...
## $ centers : num [1:3, 1:36] 0.2856 0.1014 -0.3801 0.0472 -0.3203 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:3] "1" "2" "3"
## .. ..$ : chr [1:36] "MGODRK" "MGODPR" "MGODOV" "MGODGE" ...
## $ totss : num 209520
## $ withinss : num [1:3] 72559 40655 53492
## $ tot.withinss: num 166706
## $ betweenss : num 42814
## $ size : int [1:3] 2201 1552 2068
## $ iter : int 3
## $ ifault : int 0
## - attr(*, "class")= chr "kmeans"
The following plots display hierarchical clustering on the scaled data.
## Class 'dist' atomic [1:16939110] 7.27 6.28 11.65 8.43 5.04 ...
## ..- attr(*, "Size")= int 5821
## ..- attr(*, "Diag")= logi FALSE
## ..- attr(*, "Upper")= logi FALSE
## ..- attr(*, "method")= chr "euclidean"
## ..- attr(*, "call")= language dist(x = training_u2)
## cluster1
## 1 2 3 4 5
## 4678 703 403 35 2
## cluster2
## 1 2 3 4 5
## 4678 703 403 35 2
## cluster2
## cluster1 1 2 3 4 5
## 1 4678 0 0 0 0
## 2 0 703 0 0 0
## 3 0 0 403 0 0
## 4 0 0 0 35 0
## 5 0 0 0 0 2
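The clustering above can be sketched in base R: `dist()` on the 5821 scaled rows yields choose(5821, 2) = 16,939,110 pairwise Euclidean distances, and `cutree()` produces the five-cluster tables (toy data below stand in for the scaled training frame):

```r
set.seed(1)
X <- scale(matrix(rnorm(100 * 3), 100, 3))   # stand-in for scaled training data

d  <- dist(X)                  # Euclidean distances, as in the output above
hc <- hclust(d)                # complete linkage by default
cluster1 <- cutree(hc, k = 5)  # cut the tree into five clusters
table(cluster1)
```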
The following plots display hierarchical clustering on the unscaled data.
MDS plots were generated using cmdscale. Plots are displayed for both scaled and unscaled data.
## [1] 5821 36
## [1] 2.431997e-15 6.345407e-16
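Classical MDS via `cmdscale` follows the same pattern (toy data below; the real input is the 5821 x 36 scaled frame whose dimensions are printed above):

```r
set.seed(1)
X   <- scale(matrix(rnorm(50 * 4), 50, 4))

mds <- cmdscale(dist(X), k = 2)  # classical MDS down to two dimensions
dim(mds)                         # one 2-D point per row, ready for plotting
```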